101 research outputs found

    Reducing Model Complexity for DNN Based Large-Scale Audio Classification

    Audio classification is the task of identifying the sound categories associated with a given audio signal. This paper presents an investigation into large-scale audio classification based on the recently released AudioSet database. AudioSet comprises 2 million audio samples from YouTube, human-annotated with 527 sound category labels. Audio classification experiments with the balanced training set and the evaluation set of AudioSet are carried out by applying different types of neural network models. The classification performance and model complexity of these models are compared and analyzed. While the CNN models perform better than the MLP and RNN models, their model complexity is relatively high and undesirable for practical use. We propose two strategies that aim at constructing low-dimensional embedding feature extractors and hence reducing the number of model parameters. The simplified CNN model is shown to use only 1/22 of the parameters of the original model, with only a slight degradation in performance. Comment: Accepted by ICASSP 201
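    The parameter-reduction idea in the abstract above can be illustrated with a toy calculation. The layer sizes below are hypothetical and not taken from the paper; only the 527-class output matches the AudioSet label count. A minimal sketch, assuming a single dense classification head:

```python
# Illustrative parameter count: replacing a wide classification head with a
# low-dimensional embedding bottleneck (hypothetical sizes, not the paper's).

def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out

N_CLASSES = 527   # AudioSet label count (from the abstract)
FEAT = 4096       # hypothetical wide feature dimension
EMBED = 128       # hypothetical low-dimensional embedding size

# Original head: features -> classes directly
original = dense_params(FEAT, N_CLASSES)

# Reduced head: features -> small embedding -> classes
reduced = dense_params(FEAT, EMBED) + dense_params(EMBED, N_CLASSES)

print(f"original head params: {original:,}")
print(f"reduced head params:  {reduced:,}")
print(f"reduction factor:     {original / reduced:.1f}x")
```

The bottleneck shrinks the parameter count because the two small matrices together are far smaller than one wide matrix; the paper's reported 1/22 reduction applies to its full CNN, not to this toy head.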

    Hybrid AHS: A Hybrid of Kalman Filter and Deep Learning for Acoustic Howling Suppression

    Deep learning has recently been introduced for efficient acoustic howling suppression (AHS). However, the recurrent nature of howling creates a mismatch between offline training and streaming inference, limiting the quality of the enhanced speech. To address this limitation, we propose a hybrid method that combines a Kalman filter with a self-attentive recurrent neural network (SARNN) to leverage their respective advantages for robust AHS. During offline training, a pre-processed signal obtained from the Kalman filter and an ideal microphone signal generated via a teacher-forcing training strategy are used to train the deep neural network (DNN). During streaming inference, the DNN's parameters are fixed while its output serves as a reference signal for updating the Kalman filter. Evaluation in both offline and streaming inference scenarios using simulated and real-recorded data shows that the proposed method efficiently suppresses howling and consistently outperforms baselines. Comment: submitted to INTERSPEECH 2023. arXiv admin note: text overlap with arXiv:2302.0925
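    The streaming-inference loop described above (a frozen DNN whose output feeds back as the Kalman filter's reference signal) can be sketched as follows. The filter and network here are trivial placeholders, not the paper's SARNN or its actual filter equations:

```python
import numpy as np

def kalman_step(mic_frame, reference):
    # Placeholder: a real AHS Kalman filter would estimate the acoustic
    # feedback path and subtract the predicted howling component.
    return mic_frame - 0.5 * reference

def frozen_dnn(frame):
    # Placeholder for the trained network (parameters fixed at inference).
    return np.tanh(frame)

def stream(mic_frames):
    reference = np.zeros_like(mic_frames[0])
    out = []
    for frame in mic_frames:
        pre = kalman_step(frame, reference)   # Kalman pre-processing
        enhanced = frozen_dnn(pre)            # DNN enhancement
        reference = enhanced                  # feed back to the filter
        out.append(enhanced)
    return out

frames = [np.random.randn(160) for _ in range(5)]  # e.g. 10 ms frames at 16 kHz
print(len(stream(frames)))  # 5 enhanced frames
```

The key structural point from the abstract survives the simplification: each component consumes the other's most recent output, frame by frame, rather than being trained jointly online.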

    Drosophila fasciclin II Is Required for the Formation of Odor Memories and for Normal Sensitivity to Alcohol

    Drosophila fasciclin II (fasII) mutants perform poorly after olfactory conditioning due to a defect in encoding, stabilizing, or retrieving short-term memories. Performance was rescued by inducing the expression of a normal transgene just before training and immediate testing. Induction after training but before testing failed to rescue performance, showing that Fas II does not have an exclusive role in memory retrieval processes. The stability of odor memories in fasII mutants is indistinguishable from that in control animals when initial performance is normalized. Like several other mutants deficient in odor learning, fasII mutants exhibit a heightened sensitivity to ethanol vapors. A combination of behavioral and genetic strategies has therefore revealed a role for Fas II in the molecular operations of encoding short-term odor memories and conferring alcohol sensitivity. The preferential expression of Fas II in the axons of mushroom body neurons furthermore suggests that short-term odor memories are formed in these neurites.

    ICDAR 2023 Video Text Reading Competition for Dense and Small Text

    Recently, video text detection, tracking, and recognition in natural scenes have become very popular in the computer vision community. However, most existing algorithms and benchmarks focus on common text cases (e.g., normal size and density) and single scenarios, while ignoring extreme video text challenges, i.e., dense and small text in various scenarios. In this competition report, we establish a video text reading benchmark, DSText, which focuses on dense and small text reading challenges in videos with various scenarios. Compared with previous datasets, the proposed dataset mainly includes three new challenges: 1) dense video text, a new challenge for video text spotters; 2) a high proportion of small text; 3) various new scenarios, e.g., gaming, sports, etc. The proposed DSText includes 100 video clips from 12 open scenarios, supporting two tasks, i.e., video text tracking (Task 1) and end-to-end video text spotting (Task 2). During the competition period (opened on 15th February 2023 and closed on 20th March 2023), a total of 24 teams participated in the proposed tasks, with around 30 valid submissions in total. In this article, we describe detailed statistical information about the dataset, the tasks, the evaluation protocols, and the result summaries of the ICDAR 2023 DSText competition. Moreover, we hope the benchmark will promote video text research in the community.

    Urban Treetop Detection and Tree-Height Estimation from Unmanned-Aerial-Vehicle Images

    Individual tree detection for urban forests in subtropical environments remains a great challenge due to the various types of forest structures, high canopy closures, and the mixture of evergreen and deciduous broadleaved trees. Existing treetop detection methods based on the canopy-height model (CHM) from UAV images cannot resolve commission errors in heterogeneous urban forests with multiple trunks or strong lateral branches. In this study, we improved the traditional local-maximum (LM) algorithm using a dual Gaussian filter, a variable window size, and a local normalized correlation coefficient (NCC). Specifically, we adopted a crown model of maximum/minimum tree-crown radii and an angle strategy to detect treetops, and then removed or merged the pending tree vertices. Our results showed that the improved LM algorithm had an average user accuracy (UA) of 87.3% (SD ± 4.6), an average producer accuracy (PA) of 82.8% (SD ± 4.1), and an overall accuracy of 93.3% (SD ± 3.9) for sample plots with canopy closures below 0.5. For sample plots with canopy closures from 0.5 to 1, the corresponding accuracies were 78.6% (SD ± 31.5), 73.8% (SD ± 10.3), and 68.1% (SD ± 12.7). The tree-height estimation accuracy reached more than 0.96, with an average RMSE of 0.61 m. Our results show that the UAV-image-derived CHM can be used to accurately detect individual trees in mixed forests in subtropical cities like Shanghai, China, providing vital tree-structure parameters for precise and sustainable forest management. Funding: National Key R&D Program of China; National Natural Science Foundation of China; China Postdoctoral Science Foundation. Peer Reviewed
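    The basic local-maximum treetop detection that the abstract improves on can be sketched on a synthetic CHM. The smoothing sigma, window size, and height threshold below are hypothetical choices; the paper's dual Gaussian filter, variable window, crown model, and NCC refinement are omitted:

```python
import numpy as np
from scipy.ndimage import gaussian_filter, maximum_filter

def detect_treetops(chm, sigma=1.0, window=5, min_height=2.0):
    """Find treetops as smoothed local maxima above a height threshold."""
    smoothed = gaussian_filter(chm, sigma=sigma)       # suppress branch noise
    local_max = maximum_filter(smoothed, size=window)  # sliding-window maxima
    peaks = (smoothed == local_max) & (smoothed > min_height)
    return np.argwhere(peaks)                          # (row, col) of treetops

# Synthetic CHM: two Gaussian "crowns" on flat ground
y, x = np.mgrid[0:50, 0:50]
chm = 10 * np.exp(-((y - 15) ** 2 + (x - 15) ** 2) / 20.0)
chm += 8 * np.exp(-((y - 35) ** 2 + (x - 35) ** 2) / 30.0)

tops = detect_treetops(chm)
print(tops)  # expect peaks near (15, 15) and (35, 35)
```

The commission errors the abstract targets arise exactly where this plain version fails: a strong lateral branch can produce a second local maximum inside one crown, which the paper's crown-radius and merging steps are designed to remove.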

    DatasetDM: Synthesizing Data with Perception Annotations Using Diffusion Models

    Current deep networks are very data-hungry and benefit from training on large-scale datasets, which are often time-consuming to collect and annotate. By contrast, synthetic data can be generated infinitely, with minimal effort and cost, using generative models such as DALL-E and diffusion models. In this paper, we present DatasetDM, a generic dataset generation model that can produce diverse synthetic images and the corresponding high-quality perception annotations (e.g., segmentation masks and depth). Our method builds upon a pre-trained diffusion model and extends text-guided image synthesis to perception data generation. We show that the rich latent code of the diffusion model can be effectively decoded into accurate perception annotations by a decoder module. Training the decoder requires less than 1% of the data (around 100 manually labeled images), enabling the generation of an infinitely large annotated dataset. These synthetic data can then be used to train various perception models for downstream tasks. To showcase the power of the proposed approach, we generate datasets with rich dense pixel-wise labels for a wide range of downstream tasks, including semantic segmentation, instance segmentation, and depth estimation. Notably, it achieves 1) state-of-the-art results on semantic segmentation and instance segmentation; 2) significantly more robust domain generalization than using real data alone, and state-of-the-art results in the zero-shot segmentation setting; and 3) flexibility for efficient application and novel task composition (e.g., image editing). The project website and code can be found at https://weijiawu.github.io/DatasetDM_page/ and https://github.com/showlab/DatasetDM, respectively.
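    The pipeline the abstract describes (frozen generative model, small trained decoder, unbounded synthetic data) can be sketched at a high level. Every function below is a hypothetical stand-in, not the DatasetDM API:

```python
import random

def diffusion_generate(prompt):
    # Stand-in for a frozen text-to-image diffusion model that also
    # exposes its latent code alongside the generated image.
    latent = [random.random() for _ in range(8)]
    image = f"image({prompt})"
    return image, latent

def train_decoder(labeled_pairs):
    # Only this small module is trained (abstract: <1%, ~100 labeled images).
    # Stand-in: return fixed weights instead of fitting anything.
    return [1.0] * 8

def decode_annotation(latent, decoder_weights):
    # Stand-in decoder mapping latent codes to a perception annotation
    # (e.g., a segmentation mask or depth map in the real system).
    return [w * z for w, z in zip(decoder_weights, latent)]

decoder = train_decoder(labeled_pairs=[])  # tiny supervised step
dataset = []
for i in range(1000):                      # generation is unbounded
    img, z = diffusion_generate(f"street scene {i}")
    dataset.append((img, decode_annotation(z, decoder)))
print(len(dataset))  # 1000 image-annotation pairs
```

The design point worth noting is that annotation cost is paid once, for the decoder, while image-annotation pairs are thereafter produced jointly and for free.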